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Abstract 

Variable-intensity astronomical sources are the result of complex and often extreme physical 
processes. Abrupt changes in source intensity are typically accompanied by equally sudden 
spectral shifts, i.e., sudden changes in the wavelength distribution of the emission. This article 
develops a method for modeling photon counts collected from observation of such sources. We 
embed change points into a marked Poisson process, where photon wavelengths are regarded as 
marks and both the Poisson intensity parameter and the distribution of the marks are allowed 
to change. To the best of our knowledge this is the first effort to embed change points into a 
marked Poisson process. Between the change points, the spectrum is modeled non-parametrically 
using a mixture of a smooth radial basis expansion and a number of local deviations from the 
smooth term representing spectral emission lines. Because the model is over parameterized we 
employ an £i penalty. The tuning parameter in the penalty and the number of change points 
are determined via the minimum description length principle. Our method is validated via a 
series of simulation studies and its practical utility is illustrated in the analysis of the ultra-fast 
rotating yellow giant star known as FK Com. 
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1 Introduction 


Astronomical sources that exhibit temporal variability or periodicity are among the most scientifi¬ 
cally rich in the observed Universe. Variation in source intensity can result from rotations, eclipses, 
magnetic activity cycles, or the turbulent flows of matter into deep gravitational wells. Massive 
stellar explosions known as super novae, for example, appear as abrupt peaks in the electromagnetic 
radiation emitted from a source. They sometime produce a narrow beam of intense high-energy ra¬ 
diation that appears as 7 -ray bursts from Earth—although there are other sources of 7 -ray bursts. 
These bursts originate in distant galaxies, can last from milliseconds to minutes, and are among 
the most energetic events that occur in the Universe. There are many less dramatic objects that 
produce temporal variability; they include signals with repeated peaks and troughs that may or 
may not follow a predictable periodic pattern. A variable star, for example, exhibits fluctuating 
emission that can be due to a natural expansion and contraction in its radius as it evolves (a 
so-called pulsating variable star) or to eruptions in its coronae such as flares or mass ejections. In 
extreme cases, superflares can erupt producing millions of times more energy than a typical solar 
flare; if such a flare occured on the Sun it could destroy the Earth’s ozone layer and cause mass 
planetary extinction. Although the mechanism is not well understood, super flares are most likely 


to occur on fast rotating stars (McKee, 2012). 

In this article we develop statistical methods that enable us to identify changes in observed 
emission from astronomical sources, such as those associated with massive energetic events in the 
coronae of variable stars. Dramatic shifts in intensity of these sources are typically accompanied by 
changes in the distribution of the wavelength of the photons emitted, known as the spectrum of the 
source. Thus, our methods aim to take advantage of changes both in the intensity of the emission 
and in its wavelength distribution. We illustrate our methods by applying them to high-energy 
observations of the flaring star known as EK Com. 


Data collection in high-energy astrophysics High-energy astrophysics is the study of electromag¬ 
netic radiation in the ultraviolet. X-ray, and 7 -ray bands. The production of such high-energy 
photons requires temperatures of millions of degrees and signals the release of deep wells of stored 
energy such as those in very strong magnetic fields, extreme gravity, explosive nuclear forces, and 
shock waves in hot plasmas. High-energy detectors have much in common in terms of their statis¬ 
tical properties (and much that distinguishes them from detectors in other energy regimes). In this 
article we focus on statistical methods that are appropriate for high-energy astrophysics. 

Astronomical data from high-energy observatories are usually obtained as a list of photons; the 
list records the two-dimensional direction from whence each photon arrived, the time at which it 
was recorded, and its wavelength (and hence its energy). Owing to the intrinsic resolutions of high- 
energy photon telescope/detector systems, each of these four attributes is inherently discrete and 
the data can be compiled into a four-way table of photon counts. Although each attribute is subject 
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to uncertainty (also inherent in the telescope/detector systems, e.g., the true direction of arrival 
is distorted by the point spread function), the quality of the resulting datasets is unprecedented 
in the history of astronomical observations. Nonetheless the remarkable four-way tables of photon 
attributes are largely an untapped resource. This is mostly due to a lack of principled statistical 
methods that can be used to detect and characterize astronomical sources in high-dimensional 
spaces. This is especially true of high-spectral-resolution grating data (e.g., data obtained using 
the HETGS-I-ACIS-S configuration of the Chandra X-ray Observatory), where a self-selected sample 
of interesting objects are observed for long durations (often for > 100 ksec) with Doppler resolutions 
of ~100 km s“^, and with an intrinsic temporal resolution that varies from one millisecond to three 
seconds. Because the locations of the objects we study in this article do not change, we ignore 
directional information and focus on the two-way table of photon counts indexed by time and 
wavelength. The one-way marginal table of wavelengths is called the (observed) spectrum and that 
of the times is the (observed) light curve of the source. 


Change points in Poisson processes From a statistical point of view, sudden changes in an un¬ 
derlying physical process are modeled via change points. In a parametric model, for example, the 
value of the parameter is allowed to change at one or more time points during the observation. 
The periods of time where the parameter is fixed are called the regimes of the process. We might 
consider testing for a single change point, fitting the model with a given number of change points, 
or estimating the number of change points. Given that the data in high-energy astrophysics are 
photon counts, we focus on change points in non-homogeneous Poisson processes. Because the 
wavelength (and direction) of each photon are recorded, we can either model them using a marked 
Poisson process or count the events in each of a number of wavelength bins and model them as a 
multivariate Poisson process. Typical detectors have discrete temporal and spectral resolution so 
it is natural to use photon counts in time cross wavelength bins, i.e., to use discrete-time processes. 
The methods that we propose are designed specihcally for two-way tables of Poisson counts of this 
sort. 

The complexity of any statistical model naturally increases with the number of change points. 
Thus to avoid overfitting, methods must penalize complexity. Bayesian prior distributions and 
marginalization techniques are natural avenues for encouraging parsimony, for example in that 


they inherently embody Occam’s Razor in model selection (e.g., MacKay, 2003, chapter 28). It 
is not surprising then that there is an extensive literature on Bayesian methods for change point 


models. Raftery and Akman (1986) is an early contribution for count data generated according 
to a homogeneous Poisson process until some unknown time when the intensity of the process 
changes. They propose computing the joint posterior distribution of the two Poisson rates and the 


single change point along with a Bayes Factor to test for existence of the change point; see Garlin 


et al. (1992) and Moreno et al. (2005) for derivation of hierarchal and intrinsic prior distributions. 


respectively, in the single change point Poisson setting. Green (1995) generalizes this approach by 
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using a continuous time model, allowing for multiple regimes, putting a prior distribution on the 


number of regimes, and fitting this number. Lai and Xing (2011), on the other hand, use a discrete 


time model with a Bernoulli process governing the change in regimes. This approach follows Chib 


(1998) who embedded a latent discrete-time discrete-state Markov process for the regime index 


into a multi-level Bayesian model. This formulation has been applied for inhomogeneous Poisson 


processes, for example, to model change points in the parameters of a log-linear model (Park 2010), 
see 


Park et al. (2012) for a related approach. 


Perhaps the most popular change-point method for astronomical count data is known as Bayesian 


Blocks (Scargle, 1998). It starts by splitting the time interval into two, assuming a homogeneous 


Poisson process on each, and computing the posterior odds of this model relative to a homogeneous 
process on the entire interval. The change point is chosen by maximizing the posterior odds and 
is accepted if the odds favor this model. This process continues recursively on each subinterval 


until no more change points are added. Scargle et al. (2013) improved the method by achieving 


global optimization over the change points while adding the capacity to handle gaps in the data, 
variable exposure, piecewise linear and piecewise exponential Poisson intensities, multivariate data, 
the analysis of variance, data on the circle, etc. Bayesian Blocks does not, however, address changes 
in the spectrum at the change points. 

Non-Bayesian methods for change-point detection have also been proposed for non-homogeneous 


Poisson processes. Akman and Raftery (1986) and Worsley (1986) are early contributions. Both 


papers developed methods for testing for the presence of a change-point in a Poisson process. 
These methods follow classical frequentist arguments and also provide interval estimates for any 


detected change-points. Loader (1992) considered using log-linear models for non-homogeneous 


Poisson processes. In particular likelihood ratio tests are derived to choose a “best” model amongst 


different change-point and log-linear models for the observed data. In Mei et al. (2011) methods 


are proposed for detecting changes in Poisson rate after the effects of population size are taken into 
account. These methods are based on a generalized likelihood ratio, weighted likelihood ratio, and 


adaptive thresholding. More recently, Shen and Zhang (2012) applied the modified BIC (mBIC) of 
Zhang and Siegmund (2007) to detect change-points in non-homogeneous Poisson processes. The 


mBIC can be viewed as a penalized likelihood model selection criterion that is tailored to estimating 
the number and locations of the change points. 


Change points in marked Poisson processes Despite the extensive literature on change points in 
Poisson processes, there is little on change points in marked Poisson processes. Methods have been 
developed to test for non-homogeneity in marked Poisson processes, for example via a scan statistic 


that is computed in a sliding time interval (Chan and Zhang, 2007). In high-energy astrophysics a 


common strategy is to compute a simple summary statistic describing the wavelength distribution 
in each time bin and to visually inspect a plot of the statistic as a function of time. A typical 
choice of statistics is the so-called hardness ratio which is the ratio of the photon count observed 
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above a given wavelength threshold to that below the threshold; see Park et al. (2006) for a small- 


sample statistical treatment of hardness ratios. Plotting hardness ratios against time is an informal 
exploratory technique. The primary goal of this article is to provide a change point method for 
marked Poisson processes that is inspired by and tailored to a specific problem in high-energy 
astrophysics. 


Methods for detecting change points There is a wide range of statistical approaches for detecting 
statistical change points. For the classical simple case of piecewise constant signals with additive 


noise, for example, Yao (1988) studied the use of the Schwartz criterion and showed it to be 


consistent. This criterion can be interpreted as the classical Bayesian information criterion (BIC) if 
the location of a change point is treated as a free parameter (or as a dimension of the model). In this 
case, however, the likelihood functions violate the regularity conditions necessary for the classical 


BIC, so both criteria can likely be improved. To circumvent this issue, Zhang and Siegmund (2007) 


provide an ingenious method to approximate the Bayes factor and developed the above-mentioned 
mBIC for change point detection which has been shown to be consistent in various settings. More 


recently. 

Ninomiya 

(2014 

derived a specialized Akaike information criterion (AIC) for change point 

estimation. Results from 

Aue and Lee 

(2011 

Section 3.3), however, can be used to show that this 


AIC method is not consistent. 

In this article we focus on the minimum description length (MDL) principle (Rissanen 1989| 


2007), which can also be viewed as a penalized likelihood approach to model selection. A brief 


introduction of MDL is given in Section |3.3[ One of its earliest applications to change point esti¬ 


mation is the image segmentation work of Leclerc (1989), where the corresponding MDL criterion 


can be easily reduced to the classical one-dimensional change point problem setting; see also |L 


ee 


(1998) for an extension to correlated noise. Other successful MDL applications to change point 


problems include discontinuity detection in nonparametric regression (Lee 2002a), structural break 


detection in nonstationary time series models ( 

Davis and Yau 

2013 

Yau et al. 

2015 

), and change 

point detection in quantile modeling ( 

Aue et al. 

2014 

). Under mild regularity assumptions, these 


MDL solutions can be shown to be consistent. 

For many change-point problems simpler than the one considered in this paper, the major 


penalty terms of mBIC and MDL are remarkably similar; e.g., compare Equation (8) of Zhang and 


Siegmund (2007) and Equation (6.2) of Lee (1997). This provides reassuring evidence that both 


mBIC and MDL are reliable methods for change point detection. 


Strategy and outline We pursue a non-parametric penalized-likelihood strategy, using MDL to 
optimize the tuning parameters of the penalty function and to fit the number of change points. 
Between the change points, we assume a homogeneous Poisson process where the spectrum includes 
a broad smooth term that is modeled using a radial basis expansion. In particular the bases we 
use are cubic polynomials, which provide a good balance between model flexibility and ease of 
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implementation. We also include a number of single-bin local deviations from this extended smooth 
model. Physically, these deviations correspond to spectral emission lines that are the result of the 
production of photons of a particular wavelength as electrons fall to the lower energy shells of a 
specific ion. Both the radical basis expansion and the local deviations are over parameterized in 
that we expect most of their coefficients and intensities (respectively) to be zero. Thus, we employ 
an l\ penalty on both. An R package “Automark” implementing our proposed methodology is 
available to interested readers at https://github.coni/astrostat/Autoniark. 

Some simple spectral models such as power laws and exponential absorption can be formulated 


as log-linear models (van Dyk et al ., 2001) and thus in principle could be embedded into a temporal 


model with change points for their parameters, similar to the proposal of Park (2010). We take a 
different approach by using ffexible non-parametric spectral models. This allows us to account for 
both blurring of the recorded energies and background contamination of the photon counts, and 


to apply a single model irrespective of the particular shape of the source spectra. (See van Dyk 


and Kari^ (2004) for a review of the source and data-distortion models used in high-energy spectral 


analysis.) This strategy is contrary to the typical approach in high-energy spectral analysis where 
physics-based parametric models are preferred because their parameters have direct scientific mean¬ 
ing. Our primary goal here, however, is to identify the change points. The spectra corresponding 
to each regime within the process can be fit using physics-based models in a secondary analysis. 

This article is organized into seven sections. Section describes the detectors used for data 
collection, the relevant astrophysical models, and basic notation for our statistical approach. We 
formalize our model for the homogeneous Poisson processes within each regime of the overall process 
in Section]^ and describe how we embed this into the change-point model in Section]^ Numerical 
results including a set of simulation studies and an application to the yellow giant FK Com appear 
in Sections and respectively. Discussion appears in Section and details of the derivation of 
our MDL criterion in an Appendix. 


2 Astrophysical Background and Statistical Notation 

Suppose that we observe photon counts in an x J rectangular array of equally-sized wavelength- 
cross-time bins, where yij is the photon count in the bin with wavelength range [wi—6y^/2, Wi+5u]/‘2) 
and temporal range [tj — 6t/2, tj + 6t/2), for i = 1,N and j = 1,..., J. Although this array 
may be compiled using the natural resolution of the detector, with high-resolution data it may be 
computationally advantageous to combine the natural bins in order to obtain a lower resolution 
array of photon counts. This strategy is illustrated and discussed in Section In any case, the 
expected count in each bin is ideally proportional to the brightness of the astronomical source 
integrated over its time and wavelength ranges. Because of detector effects, however, a photon 
with wavelength w has a distribution of possible recorded energies, w'. This distribution is called 
the detector response function and denoted R{w, w'). We apply our methods to data collected using 


6 











a grating that, like a prism, spatially separates photons as a function of their wavelength. With 
grating data, wavelength is measured very accurately and we can ignore measurement uncertainty 
and treat R{w,w') as a Dirac delta function. Another detector effect arises because the sensitivity 
of the detector varies with photon wavelength, an effect that is quantified via the so-called effective 
area of the detector and denoted by A{w). The effective area quantifies characteristics of the 
telescope mirrors that do not relate to the nature of the source and is typically treated as known. 
(See, however, Lee et al. (2011) and Xu et al. (2014) for development of Bayesian statistical methods 
that account for uncertainty in A(tc).) 

Using a Poisson model for the photon counts and letting X{tj,Wi) be the expected count per 
unit time and per unit wavelength averaged over the bin centered at {tj, Wi), 


Uij Poisson{(5i • 6^ ■ X{tj,Wi) ■ A{wi)}. 


( 1 ) 


In practice, we may combine data from K detectors, each with its own effective area. Letting 
Ai{w),... ,Ak{w) denote the K effective areas, the total counts across the K detectors can be 
modeled 


K 


Yij — Uijk ) 


k=l 


where 


Vijk Poisson{(5t • 6^, ■ X{tj, Wi) ■ Ak{wi)] 

are the counts recorded with detector k. Because the yijk are independent across k, we have 

r K \ 


Yij ~ Poisson < • X{tj,Wi) E Ak{wi) > . 


k=l 


The goal of this article is to infer the properties of X{tj,Wi) using the observed photon counts, 
{Yij}. We are particularly interested in detecting statistically significant changes in {Yij} over 
time. In other words, we would like to determine if there are any change point, tt, such that 
X{tj,Wi)\^^ differs from X{tj,Wi)\^^ If there are, we aim to estimate the number of change 
points and their values. 

For ease of presentation, we develop our estimation procedure in two stages. First we develop 
a flexible model for a time-homogeneous source spectrum. That is we assume that there are no 
change points and that X{tj,Wi) is constant in tj. In the second stage we introduce change points 
and allow X{tj,Wi) to change over time. 
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3 A Model for the Homogeneous Poisson Processes 

3.1 A nonparametric spectral model 


Because there are no change points in our time-homogeneous Poisson model, we drop tj from the 
argument of X{tj,Wi) and write X{tj,Wi) = X{wi) in this section. Also, we denote the expectation 
of Yij by That is, n{wi) = s{wi)X{wi) with s(rcj) = 6t ■ 6^ ^k{wi) being a completely 

specified function. We represent the expected counts due to the underlying physical process by A 
and expected detector counts by /r; ^ is adjusted for varying bin sizes and effective areas. 

The parameter of scientific interest is X{wi). For the astronomical sources that we study X{wi) 
is mostly smooth with a few emission lines overlaid and we adopt the model 

M 

XjWj) = f{Wi) amgni{Wi,Tm), (2) 

m=l 


where f{w) is a nonnegative smooth function, M is the number of emission lines, am > 0 is 
the magnitude of emission line m, and gmiw,Tm) models the shape of the emission line centered 
at Tm- Common choices for gm{W)Tm) include Gaussian density functions with small variances, 
Lorentzian density functions, and delta functions (e.g., van Dyk et al.. 2006, and the references 
therein). Here we assume that each of the emission lines locates completely within a single bin, 
although our framework can be easily modified for cases when some emission lines span multiple 
bins. For astrophysical reasons, we also assume that each Um > 0, that is each emission line 
constitutes a positive deviation from f{w), although in principle the am may also be negative, so 
long as X{wi) > 0. 

If an individual spectral line is located in a single bin, the line profile, gm{w, Tm)-, is an indicator 
function for the interval of width 5^ centered at Tm £ {rci, • ■ • that is, an indicator function 

for one of the wavelength bins. The natural resolution of the bin counts does not allow modeling 
of gm{w, Tm) at any greater level of detail than this. Under this model for the emission lines, X{wi) 
in becomes 


M 

Kwi) = f{Wi) + ^ Ctmi 

m=l 


'^m ^ ^ '^m ~pr 


where I{A) denotes the indicator function for A and Tm £ {ir'i, • • •, Because of the nonnega¬ 
tivity constraint on X{'Wi), we introduce a log-link function. 


M , 

logA(u;i) = \ogf{wi) ^ olm^ f T 

m=l ^ 


m ~Tr — ^ '^m H TT 


(3) 


where a'm is the transformed am that ensures the equality in ([^ holds. 


To provide flexible modeling, we use a radial basis expansion (see, e.g., Buhmann 2003; Ruppert 
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et al.. 2003) to nonparametrically model log f{w), the smooth component of log A(i(;). Specifically, 


we use the P basis functions, (^i(u;),... ,Cp{w)), which are polynomials of power 3 with equally 
spaced knots, ki, , Kp_ 4 j^ Specifically, the basis functions are 


^i(u;) = 1 , ^ 2 {w) = w, = w^, C4{w) = w^, and 

Cp+ 4 {w) = \w - Kpf, p=l,...,P -4. 


Substituting these into Q yields 

p M 

logA(u;i) = '^^pip{wi) + ^ 

p=l m=l 

where /3p is the coefhcient of basis function p. 

Model @ is an example of a generalized semi-parametric model. Under this formulation, 
estimating \{wi) is equivalent to estimating P, (/3i,..., fip), M, and (a'^, ri,..., tm)- 


Tm - ^ <Wi <Tni + — 


(4) 


3.2 A lasso model for the emission lines 

A difficulty in fitting model Q is the estimation of (ri,..., tm), the locations of the emission lines. 
With M unknown, there is a total of 2^ possibilities for (ri,... ,tm)- However, as emission lines 
are relatively rare (i.e., sparse), we can take advantage of the celebrated £i penalty techniques (e.g.. 


Tibshirani, 1996) to provide a fast searching algorithm for (ri,... ,tm)- 

as 


We begin by rewriting S 


P N P 

logA(u;i) = '^Pp^piwi) + '^r]khiwi) = '^Pp^pim) + 

p=l k=l p=l 


(5) 


with Ik{w) = I{wk—Sw/2 < w < Wk + Sw/’^)- la this formulation r]i = 0 implies there is no emission 
line in the wavelength bin centered at Wi. There is a one-to-one correspondence between models Q 
and via 


M = 


JV 

\—^ r 1 ^ f 1 

and _ = |(wi,r/*) : / 0| 

i=l 


A major advantage of re-expressing model Q as Q is that, many of the rf = ( 771 ,... are 

zero in Therefore by imposing an penalty, we can achieve fast simultaneous selection and 
estimation of the nonzero elements among rj. 

We employ a common strategy for the smooth component, log/(u;i) = ™ (5)- 


^The gap between the first knot the left endpoint of the spectral range and the gap between the last knot and the 
right endpoint of the range are both slightly larger than the gaps between consecutive knots. 
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Namely, we specify a large value of P a priori and estimate /3 = (/3i,..., /3p)T under a penalty term 
that aims to avoid over-fitting. In this way, the coefficients of the the redundant functions among 
^i(rc),... ,Cp{w) are shrunk toward zero. This enables us to assume that P is known (and large). 

Penalized maximnm likelihood can be used to simultaneously estimate all of the unknown 
parameters in ([^, including those describing the smooth component and those describing the 
emission lines. 

For a Poisson random variable with expectation a > 0, we write the corresponding probability 
mass function as 

g(x; o) = —a + X log a — log(x!) ( 6 ) 

for any nonnegative integer x. The log-likelihood of a bin count Yij is 


^one(h)j,0) — , s(u?j)A(tCj)}, 


where X{wi) is parameterized in terms of 6 = (/3, 77) as in ([^. (This likelihood will be used within 
one homogenous time segement in Section]^ hence the subscript “one”.) Adding a penalty, we 
dehne the estimate of 6 as the maximizer of 


N J 


EE lone{Y^y,e)-j{p\\Bf3\\i + {l-p)\\v\\i}, 

i=l j=l 


(7) 


where D = diag{(0,0,0,0,1,..., 1)} is a diagonal matrix, and 7 > 0 and 0 < p < 1 are tuning 
parameters that determine the penalty on 0 . The first fonr diagonal elements of D are set to 
zero because there is no penalty for the inclusion of (/3i,..., 184 ) in the model. If 7 and p are 
specihed, maximization of ([^ is equivalent to lasso fitting of a generalized linear model. There are 


fast algorithms for this optimization problem (e.g., coordinate descent, Friedman et al.. 2010) and 
software is widely available; e.g., the R package glmnet. 


3.3 Tuning parameter selection using MDL 

The success of Q depends heavily on our ability to choose good values of 7 and p, which determine 
the number of emission lines and the number of basis functions nsed in the smooth component of 
the fitted model. To select 7 and p, we use the minimum description length (MDL) principle 
(jRissanen 1989, 2007). In short, MDL defines the best model as the one that achieves the highest 


compression rate of the data, T). In other words, the best model allows ns to store T> with the 
shortest codelength. A good statistical model and a good compression method share a common 
feature: they should be able to capture regularities and patterns hidden in the data. Therefore it is 
reasonable to expect that a model chosen by MDL to be a good statistical model; some successful 
examples for change point problems were provided in Section while for other applications can be 
fonnd in the later chapters of Griinwald et al. (2005[). There are various versions of MDL. We nse 
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the so-called two-part form of MDL to derive a model selection criterion for choosing 7 and p. 

Let CL(Z) be the codelength of Z: one can treat QL{Z) as the amount of memory needed to 
store Z. Similarly, let CL{Z\X) be the codelength of Z conditional on X^ that is, the codelength of 
Z with knowledge of X. Generally speaking, the two-part form of MDL decomposes the codelength 
of the data T) into two parts 

CL(2?) = CL(A?) + CL(P|Ad), (8) 


where the best fitting model M is defined to be the minimizer of CL(2?). In Appendixj^it is shown 
that when fitting model for large N the two-part MDL Q is asymptotically equal to 


mdlnuii(7,/9) 


N J 

i=l j=l 


0||olog(A^J) -Flog 



(9) 


where 0 = (/ 3 ,17) is the maximizer of Q given 7 and p, and the Iq norm, ||2;||o, denotes the number 
of nonzero elements in the vector 2 :. We choose the values of 7 and p that jointly maximize ©by 
evaluating mdlnuii(7, p) on a fine grid, using glmnet to compute 0(7, p) on a grid of values of 7 for 
each value of p. In many MDL applications, a term similar to the last one in © is negligible and 
hence omitted. Here, however, the number of unknown parameters {N + P) is not ignorable when 
compared to the number of observations and hence the last term in © is required. 

In summary, for the time-homogeneous model with no change points, the parameter estimate 
6 = (/3, r)) is obtained by maximizing © with 7 and p chosen as the joint minimizer of ©. Finally, 
the estimate for \{wi) is calculated as \{vji) = exp{^^^^ $p^p{wi) + Pi}- 


4 Modeling with Change Points 

4.1 Adding change points to the spectral model 

In this section, we allow the spectrum, X{tj,Wi), to change over time. Hence we reinstate the 
notation that emphasized the dependence on tj. We model X{tj,Wi) as piecewise constant as a 
function of tj. This involves partitioning the entire time interval [0, T) into several subintervals and 
independently modeling X{tj,Wi) as constant in tj in each subinterval using a model of the form 
given in ©. Again let Yij be the photon count in the bin centered at {wi,tj) and summed over the 
K detectors. We can formalize the change-point model for X{tj,Wi) as 

B 

X{tj,Wi) = '^Ib{tj)Xb{wi), (10) 

b=l 

where H > 1 is the number of time (sub)intervals, hitj) = < tj < Tib) identifies the time 

bins in interval b, ti — St/2 = ttq < ■ ■ ■ < ttb = tj + St/2 are the change points, and Xb{wi) is the 
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spectrum in time interval b. For ease of discussion, we refer to tti, ... as the change points, 

excluding the endpoints ttq and ttb- The within time-interval spectra are modeled as in ([^, i.e., 

p 

log \b{Wi) = ^ /3bpCp{Wi) + TJbi, 

P=1 

where l3b = {Pbi, ■ ■ ■, t^hpV and rjb = (%i,..., %Ar)T are the parameters for the time-interval-specific 
spectra. 

There is no gain in allowing the change points, (vri,..., 7rs_i), to take values other than bin 
breaks if there is no prior knowledge about the change points. Further, to maintain sufficient 
data within each time interval for acceptable estimation, |7rj — tTjI cannot be too small for any 
0 < i,j < B. We require that Itt* — ttjI > 55t in our numerical illustrations. This ensures that each 
time interval is at least five bins wide. 


4.2 Model selection using MDL 


In order to fit model (10), we must estimate the number of time intervals, B, the change points 


TT = (vTi, ..., ttb-iY , and the parameters for each interval, 0 = {6i, ..., 6b}-, where 6 b = (/3fe, r]b)- 
Once B and tt are specified, however, each Xb{wi) along with its tuning parameters, 7 = 76 and 
p = Pb, can be estimated individually using the method of Section]^ Thus, we aim to first estimate 
B, TV and then compute %{B,7v), pb{B,7v), and 6 b{%,pb) as in Section]^ for h = 1,...,B. This 
notation signifies the dependence of % and pb on {B,tv), and the dependence of 6 b on {%,Pb)- 
Similarly, we also write 0 and X{tj,Wi) as 0 ( 7 , p) and X {tj, wp B, tv, &{'j, p)}, respectively. 


The profile penalized (pp) loglikelihood function for B, tv under model (10) depends on the 
tuning parameters 7 = ( 71 ,..., 75 ) and p = (pi,..., ps) and can be written 


N J 

log£pp(5,7r;7,p) = '^'^q(Yij]s{wi)X^tj,Wi-,B,TV,&{'y,p)'^'^ , (11) 

i=i j=i 

with q the Poisson loglikelihood given in ([^. Because 7 and p can be computed as functions of 
B and tv, we can substitute j{B,tv) and p{B,Tv) into log£pp(B, tt; 7 , p). This is akin to profiling 
over 7 and p, except that •y{B, tv) and p{B, tv) are computed according to the MDL criterion 
rather than by maximizing log£pp(i?, tt; 7 , p). The result is the pseudo profile penalized (ppp) 
loglikelihood given by 


log£ppp(B,7r) = log£pp(B,7r;7(B,7r),p(B,7r)) 

N J B 

= EEE Ibitj)lone^yij'-,6b{B,Tv)'^ , (12) 

i=l j=l 6=1 


12 




where Oi)[B,tv) = 6 jj{y{B,7T), p{B,7T)'j. Because different values of B lead to different numbers of 


parameters in the model, we cannot estimate the parameters by maximizing (12). Instead we view 


this as a model selection problem and again use the MDL principle. In this way we can choose the 
“best” combination of B and tt, and then compute %{B, tt), pb{B, tt), and 0b{%, pb) for each time 
interval. We derive the MDL criterion in Appendix [B] and show that the best model can be found 
by minimizing 


B 


mdlfuii(B,7r) = - log£ppp(B, tt) + log B ^ log Cfe( 7 r) 


b=l 


B 


+ E 


6=1 


-|| 0 fe(B, 7 r)||olog{Arc 6 ( 7 r)}+log . , , 

2 V||T76(B,7r)||oyJ 


N 


(13) 


where Cb( 7 r) is the cardinality of {i : iTb-i < U < TTb} for b = 1,... ,B, i.e., Cb( 7 r) is the number of 
time bins in time interval b. 


4.3 Practical minimization 


We now consider minimization of the MDL criterion given in (13). Once the number of time 


intervals, B, and the locations of change points, tt, are specified, unique estimates of 7 and p can 
be obtained using the method described in Section Since the time intervals are independent the 
required computation is highly parallelizable. Nonetheless, minimization of ( |13[ ) is not trivial. It 
involves an expensive combinatoric optimization: for each B, there are possibilities for tt. 

(A small number of these possibilities is not considered by our optimiziation procedure because 
we impose a minimum width on each time interval.) To simplify computation, we propose a fast 


stepwise forward algorithm to approximately minimize (13), rather than attempting to globally 


minimize it. Briefly, at each step we add a single change point to the current collection by selecting 
the new change point that decreases mdlfuii(B, tt) the most. 


More precisely, the algorithm first calculates (13) for the model with no change points (i.e., only 
one time interval), that is, it computes mdli = mdlfuii(lj tt). (This does not depend on tt because 
there are no change points.) In the second step, the algorithm considers all of the possible models 
with a single change point and selects the one that minimizes mdlfuii( 2 , tt). Calling the obtained 
minimum mdl2, the change point is added to the model if mdl2 < mdli; otherwise the algorithm 
terminates. The algorithm continues in this way, attempting to add an additional change point to 


the model at each step by minimizing (13). It stops when there is no possible change point remaining 


that decreases mdlfuii(B, tt) and this current model is taken as the (approximate) minimizer of ( 13 ). 


Although this greedy algorithm may result in local minimization, it is fast and provides a good 


compromise between computational efficiency and accuracy (Lee, 2002b). The exact steps of the 
algorithm appear in Algorithm 
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Algorithm 1 A stepwise search algorithm for minimizing (13) 


Input: Wavelength-cross-time bin counts 

Description: A fast algorithm for approximately minimizing mdlfuii(i?, vr, 7 , p) 

1 \ B 1 and tt -^ () (null vector) 

2: z ^ mdlfuii(l,7r). 

3: z' ^ Z + l 
4: while z < z' do 

5: B <- B + 1 

6: z' ^ Z 

7: Tt' •(— TT 

8 : Use a grid search to find the change point that when added to the current collection, tt', 

gives the largest reduction in mdifuu. Denote this new set of change points as tt. If there is no 
possible change point to add, break the while-loop. 

9: 2 ^ mdlfuii(i?, tt) 

10: end while 

11 : Return {B — 1, tt'} 


4.4 A Monte Carlo Procedure for Testing for the Existence of Change Points 


In practice it is useful to quantify evidence for existence of the change points. For a test statistic, we 


use the change in the MDL function (13) due to introduction of change points, m* = mdlfuu(I?, tt) — 


mdlfuii(l, TTnuii), where B and tt are computed by minimizing (13), and TTnuu is the null vector 


containing no change point. Here mdlfuii(l, TTnuii) is the value of the MDL function (13) with no 


change points. We approximate the null distribution of m* via Monte Carlo using permutation. 
Specifically, we generate iVsim uniformly random permutations of {1, 2,..., J} independently and 
the corresponding replicate null datasets, (Di,..., by shuffling the time indices of the 

original dataset according to these permutations (see Step 3 of Algorithm]^. Each replicate dataset 
is then fit with the full model (including change points) according to the MDL criterion specified 
in (13) and the resulting test statistic, m* is computed (for i = 1 ,..., Asim ). Note that the MDL 


criterion (13) with no change points, mdlfuu(l, tThuIi), is invariant to the permutation and thus need 
not to be re-computed for the permuted datasets (Di,... ,'Dn^.^)- The empirical distribution of 
(m^,..., ) is a Monte Carlo approximation to the null distribution of m* and a p-value can 

be computed by comparing m* to {m ^,..., . ). Details appear in Algorithm 


5 Numerical Experiments 

We conducted a series of simulation experiments to evaluate the empirical properties of the proposed 
methodology. Binned photon counts were simulated under eight different spectral-temporal test 
functions, In order to make the experiments as realistic as possible, seven of the test 
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Algorithm 2 Monte Carlo significance test 


Input: Working data {(u)*, tj, number of simulation Ngim, significance level a 

Description: Monte Carlo test for change point 


1 : Fit the time homogenous Poisson moded (with no change point, Section]^ and full Poisson 
model (including change points, Sectionto {(rci, tj, and denote the test 

statistic m* = mdlfuii(i?, tt) — mdlfuii(l, TTnuii)- (tThuII = (), null vector) 

2 : for fe = 1 to Nsim do 

3: Generate the data set = {{wi,tj,Yip^)} where {pi,... ,pj} is a uniformly random per¬ 

mutation of {1, 2 ,...,J}. 

4: Fit the full Poisson model (including change points) using the procedure in Section 4.3 to 

T^k- 

5: Denote the resulting test statistic by 

6 : Estimate the p-value as 

#{ml > m*} + 1 


P = 


Asim + 1 


7: Conclude that there is at least one change point if p < a for some significance level, a. 


functions were constructed by fitting a spectral-temporal model, using the method described in 
Section to the seven observations of FK Com described in Section Because these test functions 
have at most two change points and because we would like to investigate the statistical properties 
of our method with more change points, an eighth test function was created by concatenating two 
of the original test functions. (I particular, those created from observation identification (ObsID) 
numbers 13251 and 12298, see Table[^) The analyses carried out here, including the binning of the 
data, is identical to that described in Sectionj^ Further details are summarized in Table[^ Together 
these eight test functions cover a wide range of possible scenarios, ranging from models with no 
change point to models with four change points. For each of the eight X{tj,'Wi), 200 data sets were 
simulated according to |^, using the real data values for 6 t, and A{wi) = Ylik=i Ak{wi). The 
proposed methodology was applied to all 8 x 200 = 1400 simulated data sets to estimate X{tj,Wi) 
for each. 

The simulation results are summarized in Table [2j The first column lists the ObsID number 
for the observations of FK Com (in chronological rather than ascending order) used to generate 
the X{tj,Wi), see Section]^ The second column shows the (true) numbers of homogeneous time 
intervals, B, in each of the X{tj,Wi)-, recall the number of change points is B — 1. The third column 
reports the percentage of the simulated datasets for which the fitted B exactly equals the true 
value. The proposed methodology achieved a correct recovery rate of at least 94% in 6 out of the 
8 simulations settings. 

Our method also accurately estimates the change points, tt, as shown in Column 4 of Table 
For those simulations where B is correctly estimated. Column 4 gives the root mean squared error 
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Table 1: Details of simulation settings. The columns report (1) the chronologically ordered (except the 
last row) ObsID numbers of the datasets used to generate each test function, (2) the number of wavelength 
bins, (3) the number of time bins, (4) the number of basis functions P, as in Equation Q, used to fit the 
corresponding FK Com observation, (5) the number of time intervals, i.e., the number of change points plus 
one, estimated from the corresponding FK Com observation, (6) vector of the numbers of emission lines with 
element j corresponding to time interval j, estimated from the corresponding FK Com observation. 


ObsID 

N 

J 

P 

B 

M 

12297 

142 

21 

34 

2 

(0, 1) 

12356 

142 

32 

34 

1 

(1) 

13251 

142 

49 

34 

3 

(0, 0, 0) 

12298 

142 

20 

34 

2 

(2, 1) 

13259 

142 

23 

34 

2 

(0, 0) 

12357 

142 

18 

34 

1 

(1) 

12299 

142 

20 

34 

3 

(0, 0, 0) 

13251,12298 

142 

69 

34 

5 

(0, 0, 0, 2, 1) 


Table 2: Simulation results. The columns report (1) the chronologically ordered (except the last row) 
ObsID numbers of the datasets used to generate each test function, (2) the true number of time intervals, 
i.e., the true number of change points plus one, (3) the percentage of the fitted B that exactly equal the true 
S, and (4) the root mean square error of tt among those simulations with B = B. 


ObsID 

B 

% correct B 

RMSE(fi-) 

12297 

2 

94 

0.70 

12356 

1 

100 

0 

13251 

3 

98 

0.66 

12298 

2 

98 

0.29 

13259 

2 

80 

0.69 

12357 

1 

100 

0 

12299 

3 

100 

0 

13251,12298 

5 

85 

0.66 


(RMSE) of TT, in terms of the time bin width {5t = 2000 seconds in all cases). The RMSE is less 
than one bin width in all eight cases. Figure displays histograms of the estimated change point 
location among those simulations for which B = B under the five test functions with less than 
a 100% recovery rate for tt. In all five cases, the estimated change point locations are narrowly 
centered around the true change points. Finally, for those simulations for which B = B and tt = tt, 
the uncertainty of / is summarized in Figure Here we only present results for one test function 
(ObsID 12299); plots for the other test functions appear in Appendix [Cj For each of the B = 3 
homogeneous time intervals associated with ObsID 12299, we overlay the fitted / of the simulations 
for which B = B and tt = tt. The results are plotted in panels (a)-(c) of Figure]^ The relatively 
high uncertainty at high wavelengths is partly due to the small effective area at these wavelengths; 
see panel (d) of Figure]^ 
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Figure 1: Histograms of detected change point locations in the simulation study. The title of each panel 
refers to the ObsIDs given in Tables [T] andThe widths of the histogram bins is the same as the time binning 
of the data; bins containing the true change-points locations are represented in black. The horizontal axis is 
elapsed spacecraft time and the panel corresponding to the concatenation of ObsID 13251 and ObsID 12298 
has a broken x-axis because there is a gap in spacecraft time between the two observations. 


6 Application to X-ray Observations of FK Com 


FK Com is an evolved yellow giant star (spectral type G4 III) with a mass of about 8.4 times that 
of the sun and at a distance of 550 ± 60 light-years. Typically, late-spectral-type giants are slow 
rotators, and thus do not maintain a strong magnetic dynamo that can sustain a high-temperature 
corona. However, FKCom is an extremely fast rotator, with equatorial velocities of 179 km/s, and 


a rotational period of 2.4 days (Korhonen et al. 1999 Strassmeier, 2009). (For comparison, the Sun 
rotates at about 2 km/s at its equator.) Giant stars are formed when a sun-like star depletes the 
hydrogen at its core and thus transitions from producing energy though the fusion of hydrogen into 
helium to the fusion of heavier elements. Typical sun-like progenitor stars would require rotational 
speeds that would tear them apart in order to form a giant star with such a rapid rate of spin. 
Thus, the favored hypothesis for the formation of FK Com-like stars is that they are the result of a 
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(a) (b) 





Figure 2: Panels (a) - (c): Overlay plots of the / fitted to the 200 simulations generated with the test 
function corresponding to ObsID 12299. Notice that all 200 simulations resulted \n B = B and tt = tt. 
The three panels represent the B = 3 homogeneous time intervals of ObsID 12299. The areas between the 
/ computed with each simulated dataset and the underlying test function are colored and overlaid. Darker 
color represents more overlap of these areas and thus summarizes the concentration of the the fitted spectra 
around the underlying test function. The solid black line represents the smooth component, /, of test 
function 12299 and the dashed lines represent the 5% and 95% (pointwise) percentiles of /. Similar plots for 
the other test functions appear in AppendixPanel (d): Plot of the effective area ^k{w) curve for 

test function 12299. 
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binary star system that has coalesced through a process of mass transfer, dramatically speeding up 
the stellar rotation in the process ( ]Bopp and Sten^ 1981). If this is the case, the high rotation 
should induce a strong dynamo and consequently a dynamic corona, filled with high-temperature 
plasma and organized into loop-like structures via magnetic fields. 

FK Com exhibits long duration spots—akin to sunspots—and flares associated with these spots. 
Optical Doppler imaging observations of FK Com demonstrate the existence of persistent spot-like 


features (Elstner and Korhonen, 2005) that remain localized to specific longitudes for long timescales 


(>months). Analysis of an X-ray flare observed with the XMM-JVewton observatory showed that 


the flare is tied to these spots ( 

Drake et al. 

2008 

). Numerical MHD modeling of the stellar wind 

and the magnetic structure ( 

Cohen et al. 

2010 

) shows a highly complex coronal environment, with 


the most notable feature being even putatively stable coronal loops “wrapping around” due to the 
rotation. Thus, even though the corona appears to be solar-like in its origin, it is an extreme case of 
such coronae, and is therefore of considerable interest to both theoretical and observational studies. 

A long-duration, high-resolution X-ray observation of FK Com was carried out with the Chandra 
X-ray Observatory over seven discontinuous time periods in April 2011. The data show that the 
source brightness is constantly fluctuating with occasional periods of singular flaring activity, see 
Figures!^ and 1^ As a test case for our method, we apply it to these seven Chandra observations of 
FKCom. Each was preprocessed by discarding photons with wavelength less than 1.65 or greater 
than 31 Angstroms. The natural resolution of Chandra data of this sort is extremely high, resulting 
in a very large, but very sparse table of photon counts; typically there is at most one photon recorded 
in each wavelength-cross-time bin. We can obtain substantial computational savings with only a 
modest sacrifice of scientific information by reducing the resolution of the data. Massive events 
in the corona of EK Com, for example, require appreciable time and are associated with dramatic 
spectral shifts. Change points typically appear with frequencies on the order of tens of thousands 
of seconds, whereas the inherent bin width is about 3 seconds. In our analyses, we reduce this 
resolution and use temporal bins of width 6 t = 2000 seconds. Similarly, the inherent wavelength 
bin width is 0.005 Angstroms and we set 6 w = 0.2 Angstroms. Though a substantial reduction in 
spectral resolution, our analysis is much higher resolution than the standard hardness ratio analysis, 
which can be viewed as using two or three wavelength bins of width 10-15 Angstroms. Our fitted 
X{tj,Wi) are plotted in Eigurej^ 

Table 3: Monte Carlo p-values for the test for change points in each of the seven observations of FK Com 


ObsID 12297 12356 13251 12298 
p-value 0.12 1.00 <0.01 0.01 


13259 12357 12299 

0.14 1.00 <0.01 


We also applied the Monte Carlo test for change points as described in Section 4.4 the resulting 
p-values for the seven datasets appear in Table There is no evidence for change points in two of 
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Time [x1 0® sec] 

Figure 3: The phase-time light curve of FKCom. The source brightness (i.e., light curve) over the course 
of the Chandra observation is shown during periods when the source was observed. The horizontal axis 
represents spacecraft elapsed time, and the vertical axis represents the rotational phase of the star. The 
tilted red lines show the phase-time correspondence for each observation, showing the time and rotational 
phase coverage of the full dataset. Overlaid on the phase-time line, displayed as a deviation from it, are 
sparklines depicting the smoothed light curves (green curves). The segments are labeled by the ObsID 
numbers. This representation of the light curve shows that there is no obvious rotational signature in the 
light curve, and the variability is stochastic. Notice the large flare that occurs in ObsID 12299: our analysis 
shows that the spectrum varies significantly during this flare, see Figure [5| 


the observations (ObsID 12356 and 12357), very weak evidence for two (ObsID 12297 and 13259), 
and strong evidence for the other three. This agrees with the htted models illustrated in Figure 
but quantifies the degree of evidence for the fitted change points. Finally, the two test functions that 
resulted in relatively low recovery rates of B in the simulations described in Section correspond 
to the two data sets with the weakest evidence for the a change point, compare ObsID 12297 and 
13259 in Table [^(% correct B) with Table(p-value). 

Although we hnd two or fewer change points in each observation, the number was not limited 
in our analyses. Operational constraints limit the continuous observation time available with the 
space-based Chandra X-ray Observatory. The change points that we observe stem from massive 
energetic shifts in the cornea of FK Com and take time to evolve. Keeping in mind that a single 
massive shift of this sort on the Sun would likely destroy human civilization, observing a total of 
seven in one (discontinuous) observation period spanning less than a fortnight (see Figure]^ is not 
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Figure 4: Observed data. Each point represents a photon detected with HETGS+ACIS-S configuration 
of Chandra (including both low- and high-energy gratings data). The wavelengths are given in Angstroms, 
and the time is in spacecraft clock seconds. The ObsIDs of each segment are marked to the right of the 
red vertical line representing the starting time of the observation segment. The collected data are sparse, 
but some changes in the density of points can be discerned even by eye, indicating temporal and spectral 
variations. 


a small number. 

This analysis indicates that large changes in intensities are accompanied by significant changes 
in the spectrum of the source. This is readily apparent in the results for ObsID 12299, which 
shows the observation separated into three segments with different intensities and spectral shapes; 
see ObsID 12299 in Figure and row 7 of Figure The spectrum changes to being dominated 
by high-energy photons due to an increase in high-temperature plasma when the flare is set off. 
As the flare decays, even when the observed intensity is relatively high, the spectrum can be seen 
returning to the pre-flare state, though with enhanced emission over the line-dominated region 
around 15 Angstroms. 

From a theoretical point of view, this is not an obvious result, since for active stars like FK Com, 
it is believed that even the apparently quiescent corona emits X-rays as a superposition of a large 


number of weak flares (see, e.g., Kashyap et al. 2002; Giidel 2004). However, our analysis favors the 
idea that hot plasma could become trapped in stretched field lines wrapping around the star, and 
thus cool radiatively, and flares are only triggered sporadically when the magnetic strain becomes 
too large to be sustained. While a detailed modeling and interpretation of this picture is beyond 
the scope of this paper, we point out that even a simple application of a statistically principled 
method is enough to obtain an important result: The quiescent corona of FKCom is unlike the 
flaring corona, and in this sense is much more solar-like than previously anticipated. 
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Figure 5: Fitted X{tj,Wi) for seven Chandra observations of FKCom. The ObsID of each dataset appears 
in the title of each panel. The left plot in each row shows the fitted spectra for each identified time interval; 
the solid black, dashed red, and dotted green curves correspond to the first, second, and third time intervals, 
respectively. The right plot in each row is a heat maps that represents the best fit values of X{tj,Wi). Time 
is given in elapsed spacecraft time. 
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7 Discussion 


This article develops a novel approach for modeling change points in a class of marked Poisson 
processes, specifically the time-varying spectra of high-energy astrophysical sources. The approach 
includes a family of change point models, a practical algorithm for estimation, and an MDL crite¬ 
rion for selecting the best fitting model. Despite numerous reported applications where the MDL 
principle leads to reliable model selection criteria, its use in the statistical literature remains rather 
sparse. It is our hope that the present work, in which the MDL principle is applied to solve a 
complicated model selection problem that involves the “large p small n” scenario, will encourage 
and stimulate renewed research interests in this useful principle among statisticians. 

Although the methods developed in this article are motivated by the search for change points in 
high-energy time-varying spectra, it is possible to extend them to include spatial variations as well. 
Typically, astrophysical images include sources at many different scales, ranging from unresolved 
point sources to complex structures that may span the field of view. Owing to instrumental point- 
spread functions, point sources in close proximity may overlap and in many cases are overlaid on 
larger scale structures. An extension that can account for change points in the spatial plane or in the 
spatial cross spectral hyperplane would be quite useful in astronomical data analysis. Statistically, 
this involves extending the dimension of the marks in the Poisson process from the one dimensional 
photon wavelengths to two dimension spatial locations or three dimensional spatial cross spectral 
measurements. This would, of course, involve extensions of the non-parametric models developed 
here for spectra to higher dimensional models for images or joint models for images and spectra. 
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A Derivation of the MDL criterion 

This appendix derives Q, and it begins with the calculation of the term CL(At) in Q. In Section]^ 
every candidate model Ai can be uniquely identified by a 0 and thus CL(At) = CL(0). Now to 
completely describe an 0 = could first specify which elements are nonzero, and then 

specify the actual values of these nonzero elements. As there are (||J||g) ways of arranging ||/3||o 
nonzero Pj’s in P “slots”, we need log 2 (||d||o) specify which /3j’s are nonzero. Similarly we 

need log 2 (||^||(,) bits to specify which rjj^s are nonzero. 

Now for the actual values of the nonzero elements in 0. As they are penalized maximum 
likelihood estimates, the approximate effective codelength for each of them is ^log 2 (NJ) (e.g., 
, giving the total codelength for all nonzero elements as ^ ||0||o log 2 (NJ). Therefore 

CL(A1) = CL(0) = l||0||olog2(iVJ) + log2 +log2 (^11^1 J. 

Since in practice N ^ P so the second term is ignored, giving our final expression for CL(Ad): 


Rissanen 


1989 


1 - / N 

CL(>f) = -||0||olog2(NJ) +log 2 ( II .,|| 

2 VII 0 IIO 

We remark that in many classical applications of two-part MDL where the number of possible 
parameters is small when compared to the number of data points, a penalty term similar to the 
last term in the above expression is typically ignored. However, for the present problem the number 
of potential parameters P + N is not ignorable when compared to the number of data points NJ 
(where J could be small due to change points, see Section]^. Failing to consider this last term will 
lead to an overfitted model. This last term shares the same role as the additional penalty term in 


the Extended Bayesian Information Criterion (EBIC) proposed by Chen and Chen (2008). These 


authors have shown that the traditional BIC fails in the so-called “large p small n” scenario and 
an additional penalty term is required to guarantee consistency properties. 

( jl989 ) that it 


Lastly we need to calculate the term CL()D|Wf), which has been shown by 


Rissanen 


is given by the negative of the log likelihood. In our case this gives CL(P|A4) = — X]/=i ^oneiXij] 0) 

with base 2 logarithm. Combining the expressions for CL(Ad) and CL(2?|AI) and changing the base 
of the logarithm from 2 to e, we obtain ([^. 
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Figure 6: Similar to Figure§but for test functions 12297 (77%), 12356 (100%), 13251 (69%), 12298 (90%), 
13259 (62%), and 12357 (100%). Results are only plotted for simulations with B = B and n = tv] the 
proportions of simulations for which B = B and tt = tt shown in the parentheses. Plots for the concatenated 
test function are not reported. 


B Derivation of the MDL criterion (13) 


This appendix derives the MDL criterion (13). As before, we need to calculate CL(iD) = CL(Ad) + 
CL(iD|At). For model (10), the codelength CL(Ad) for any candidate model A4 can be decomposed 
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into 

CL(;W) = CL(B) + CI{7v\B) + CI{Q\B,7v) 

Since B is an integer, its codelength is CL(S) = log 2 -B. Now for the codelength of tt. Recall that 
Cfe is the length of the time segment b. Therefore knowing the values of (ci,..., cb) is equivalent 
to knowing the values of all change points, (vri,..., itb-i) ■ Thus it suffices to encode (ci,..., cb), 
and because they are integers, we have 

B 

CI{tv\B) = ^log2Pb- 

b=l 

For the last term, we use the same arguments in Appendix and obtain 

B B 

CL(0|R,7r) = J]CL(0,|R,7r) = 

6=1 6=1 



Finally, CL('D|Ad) is given by the negative of the log likelihood (base 2) of the candidate model 
being considered, which gives 


N j B 

QL{V\M) = EEE 

1=1 j=l 6=1 


with base 2 logarithm. Changing the base of the logarithm terms, we obtain (13). 


C Supplement to Section 

Overlay plots of the fitted / to the 200 simulations generated with the test functions corresponding 
to ObsID 12297, 12298, 12356,12357, 13251, and 13259 are shown in Figure [^These test functions 
share the same effective area curve depicted in Figure j^d). 
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